HW2: exploration

Exploring Los Angeles County crime data from 2020-2024

Author

Rosemary Juarez

Published

February 28, 2024

# setting my chunk options
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

HW 3 overview:

For HW 3, I plan to go with option 2, which is creating an infographic using data I chose and displaying three different charts.

My question for this project is: What are the general crime statistics in LA County?

My question changed from the previous question I had, so now it is much more broad. It is broad for one reason:

I’m still looking for the story angle I want to go with, whether it’s a focus on teen stats, weapon stats, or victim stats. I’m still undecided, so I’m hoping that by HW 4 I will have decided on an angle for storytelling.

Previously, my worry was that I didn’t have enough numerical variables. I have since realized that this is okay, as my personal goal for this project is to explore the dataset and reveal what story I can uncover from it.

The variables I plan on using are victim sex, victim age, victim race, weapon description, crime description, area name (city), and date. Those 7 variables should give me enough data to uncover multiple stories.

So far, the graphs I have produced for this data set are bar charts and pie charts. Using primarily tidyverse and ggplot2, I created three bar charts and two pie charts. All the bar charts are counts of either weapon description, crime description, or gun-usage description. I found that a handgun is the most common gun used in crimes. Fun fact: the gun chart itself looks like a gun, so hopefully I can polish it up by HW 4 to have that gun-like effect. For my pie charts, I decided to create rings to represent handcuffs. I explored both the race count and the area name count to create them. I have found that Hispanic victims and the 77th Street area in South LA have the most crime reports.

I’m still debating whether to change one of my two pie charts into a bar chart, as I want to have a bit more variety. However, I would rather go with the other option of adding another viz to my infographic, essentially creating 4 plots.

As for inspiration, I decided to go with two options:

  • tidytuesday Dr. Who viz (link)

    • I really like that they imported images from another app to add to R. I use Procreate (a drawing app) a lot for my logo designs, so adding some of my own designs to my viz in R is a possibility I hadn’t considered before seeing this example.
  • UFO sightings viz (link)

    • I really like how the theme plays up the aliens, the dark night, and the green tech-style font. The way they carried the theme through so consistently is what I hope to capture with my own data viz.
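The image-import idea from the Dr. Who example could look something like the sketch below. It is only a minimal illustration under stated assumptions: the red square is a placeholder standing in for a Procreate export (a real drawing would be loaded with `image_read()` and a file path), and the plot uses the built-in `mtcars` data rather than the crime data.

```r
library(ggplot2)
library(magick)

# Placeholder image standing in for a Procreate export (an assumption for
# illustration; a real drawing would be read with image_read() and a file path)
sticker <- image_blank(width = 100, height = 100, color = "red4")

# annotation_raster() draws the image inside the plotting area,
# positioned in the units of the x and y axes
plot_with_image <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "midnightblue") +
  annotation_raster(as.raster(sticker),
                    xmin = 4.5, xmax = 5.4, ymin = 28, ymax = 34) +
  theme_classic()
```

Printing `plot_with_image` displays the scatterplot with the placeholder square in the top-right corner. The corner coordinates are arbitrary and would need adjusting for any real chart.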

As for my data viz design idea, I wanted to create a detective investigation pinboard, so that the infographic looks like part of an investigation. My rough sketch shows an investigation pinboard: the pinboard is brown, with three main sections to represent my data, and a big title in the top middle says 'LA Crimes'. I also have several large dots to mark where more visualizations could be added, and red string to show how all the data connect to each other.

The main challenge I encountered while creating my data viz is that my rough draft does not look very good yet. Since I am currently drawing and exporting all the images and colors I will use for my viz from Procreate (similar to the Dr. Who viz), my plots look plain for now. I’m hoping, however, that by HW 4 my infographic will be completed and look much better.

Packages I will be using for my project:

#packages used throughout this project
library(tidyverse) #main use for data wrangling and plotting (includes ggplot2 and forcats)
library(janitor) #helps with clean names for my variables
library(lubridate) #need this for my time data. Mostly for wrangling time data
library(stringr) #this helps with dealing with strings and characters in my data
library(magick) #this calls in images
library(gridExtra) #organizing my plots
library(patchwork) #organizing my plots even better
library(sysfonts) #provides font_add_google()
library(showtext) #renders the Google fonts in plots

font_add_google(name = "Special Elite") #for the typewriter font
font_add_google(name = "Nosifer") #for the bloody font
showtext_auto() #turn on showtext so the fonts above are used when plotting

Reading in my CSV on LA crimes from 2020 to present (Jan 18 2024)

#reading in my data from my local computer
la_crimes <- read_csv("C:/Users/rosem/Documents/MEDS/Courses/EDS-240/HW/Juarez-eds240-HW4/data/Crime_Data_from_2020_to_Present_20240131.csv") %>% 
  clean_names()

Data wrangling portion

#==============================================================
#                 data wrangling
# =============================================================

#will create a new column that describes the 6 main race categories. descent codes for reference:
#A - Other Asian, B - Black, C - Chinese, D - Cambodian, F - Filipino, G - Guamanian,
#H - Hispanic/Latin/Mexican, I - American Indian/Alaskan Native, J - Japanese, K - Korean,
#L - Laotian, O - Other, P - Pacific Islander, S - Samoan, U - Hawaiian, V - Vietnamese,
#W - White, X - Unknown, Z - Asian Indian

asian_countries <- c("A", "C", "D", "F", "L", "J", "K", "V", "Z")


# Define a named vector mapping the current categories to their full names
race_names <- c(
  "A" = "Asian",
  "B" = "Black",
  "W" = "White",
  "H" = "Hispanic",
  "I" = "Native American/Alaska",
  "P" = "Pacific Islander"
)

#------------------------
#   regular wrangling
# -----------------------

#creating a cleaned-up version of la_crimes. keeping the name so that I have fewer names to remember

la_crimes <- la_crimes %>% 
  #removing non-positive values in the `vict_age` column, as 0, -1, and -2 indicate that no age was recorded.
  filter(vict_age > 0) %>% 
  #I want to include all values for victim sex, however to first test out my plots, I want to view just male and female for simplicity
  filter(vict_sex %in% c('M', 'F')) %>% 

#-----------------------------
#   asian country aggregation
#-----------------------------

  #Asian countries will be aggregated to one. using case_when to reassign any descent code listed in asian_countries to the letter "A"
  mutate(race_category = case_when(
    vict_descent %in% asian_countries ~ "A",
    TRUE ~ vict_descent  # keep every non-Asian descent code unchanged in the new "race_category" column
  )) %>% 

  #filter for the top 6 race categories
  filter(race_category %in% c('B', 'H', 'W', 'I', 'P', 'A')) %>% 

  #rename the categories in the new race column
  mutate(race = case_match(race_category,
                           "B" ~ "Black",
                           "H" ~ "Hispanic",
                           "W" ~ "White",
                           "I" ~ "Native American/Alaska",
                           "P" ~ "Pacific Islanders",
                           "A" ~ "Asian"
                           ))
#remove time component from the date and add a month column
la_crimes <- la_crimes %>% 
  mutate(date_occ = as.Date(mdy_hms(date_occ)),
         #month as a factor labelled January through December, in calendar order
         month = factor(month(date_occ), levels = 1:12, labels = month.name),
         #treating victim age as a categorical variable
         vict_age = as.factor(vict_age))

min(la_crimes$date_occ)
[1] "2020-01-01"
max(la_crimes$date_occ)
[1] "2024-01-22"
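As a small illustration of the date handling above, here is a sketch on two made-up timestamps. The month/day/year with AM/PM layout of the strings is an assumption about how `date_occ` is stored in the CSV, not something confirmed here.

```r
library(lubridate)

# Example strings in a month/day/year hour:minute:second AM/PM layout.
# The exact layout of date_occ in the CSV is an assumption.
raw_dates <- c("01/08/2020 12:00:00 AM", "11/30/2023 12:00:00 AM")

# mdy_hms() parses the full timestamp; as.Date() then drops the time component
parsed <- as.Date(mdy_hms(raw_dates))

# month() returns 1-12; factor() attaches the month names in calendar order
month_label <- factor(month(parsed), levels = 1:12, labels = month.name)

parsed
month_label
```

Keeping all twelve levels, even when some months are missing from the data, means any month-based plot keeps its calendar order.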
#====================================
#     new data frames for plotting
#====================================

#creating crime description counts, sorted from most to least common
crime_desc <- la_crimes %>% 
  count(crm_cd_desc, name = "count", sort = TRUE)


#crime description by sex: keeping the 10 most common crime/sex combinations
#(counting without leftover groups so that slice_max ranks across the whole table)
crime_desc_sex <- la_crimes %>% 
  count(crm_cd_desc, vict_sex, name = "count") %>% 
  slice_max(order_by = count, n = 10)

#creating weapon description counts, sorted from most to least common
weap_desc <- la_crimes %>% 
  count(weapon_desc, name = "count", sort = TRUE) %>% 
  #dropping reports with no weapon recorded
  na.omit() %>% 
  #filtering out those that are unknown or not physical
  filter(!grepl("UNKNOWN|OTHER|VERBAL", weapon_desc))

#of those weapons, which ones are guns?
weap_gun <- weap_desc %>% 
  filter(grepl("GUN|PISTOL|RIFLE", weapon_desc))
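To see what the keyword filters above keep and drop, here is a minimal sketch on a made-up vector of weapon descriptions. The strings are illustrative only; the actual values in `weapon_desc` may be spelled differently.

```r
# Toy weapon descriptions, made up for this sketch
toy_weapons <- c("HAND GUN", "STUN GUN", "RIFLE", "KNIFE",
                 "VERBAL THREAT", "UNKNOWN FIREARM")

# Same two-step logic as the wrangling code: drop unknown/other/verbal entries,
# then keep descriptions that contain any gun-related keyword
physical <- toy_weapons[!grepl("UNKNOWN|OTHER|VERBAL", toy_weapons)]
gun_like <- physical[grepl("GUN|PISTOL|RIFLE", physical)]

gun_like
# [1] "HAND GUN" "STUN GUN" "RIFLE"
```

Note that a plain keyword match on "GUN" also keeps entries such as "STUN GUN", so it may be worth checking the matched descriptions by eye before describing them all as firearms.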

Now we create our data viz

#this one visualizes the Top 10 Most Common Crime Descriptions Reported
top_10_bar <- crime_desc %>% 
  slice(1:10) %>% 
  ggplot(aes(x = fct_reorder(crm_cd_desc, count), y = count)) +
  geom_col(fill = "midnightblue") +
  #fontface (not face) is the geom_text argument for bold labels
  geom_text(aes(label = count), hjust = 1.2, color = "white", fontface = "bold", size = 8) +
  coord_flip() +
  theme_classic() +
  labs(title = "Top 10 Most Common Crime Descriptions Reported") +
  theme(axis.title.y = element_blank(),
        axis.line.y = element_blank(), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),
        axis.line.x = element_blank(), 
        axis.ticks.x = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(family = "mono",
                                  size = 15, 
                                  color = "red4",
                                  face = "bold")
        )
top_10_bar

#here, I am looking for the top 5 weapons recorded in police reports:
top_5_weap <- weap_desc %>% 
  slice(1:5) %>% 
  ggplot(aes(x = fct_reorder(weapon_desc, count), y = count)) +
  geom_col(fill = "midnightblue") +
  geom_text(aes(label = count), hjust = -.2) +
  coord_flip() +
  theme_classic() +
  labs(title = "Top 5 weapons") +
  scale_y_continuous(limits = c(0, 200000)) +
  theme(axis.title.y = element_blank(),
        axis.line.y = element_blank(), 
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),
        axis.line.x = element_blank(), 
        axis.ticks.x = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(size = 25, family = "mono", face = "bold", color = "red4"),
        plot.background = element_blank(),
        panel.background = element_blank()
        )
top_5_weap

ggsave(filename =  "~/../Downloads/weap.png", top_5_weap, width = 6, height = 3)

#I want to record the top 5 crime reports involving guns
weap_graph <- weap_gun %>% 
  slice(1:5)%>% 
  ggplot( aes(x = fct_reorder(weapon_desc, count), y = count)) +
  geom_col(fill = "black") +
  geom_text(aes(label = count),family = "Special Elite", hjust = -.2, color = "red4", size = 9) +
  coord_flip() +
  theme_classic()+
  labs(title = "Crime Reports involving Firearms",
       subtitle = "From 2020-2024, there were 589,000 individual crime reports. Of those reports, 24,000 involved firearms."
       ) +
  scale_y_continuous(limits = c(0, 20000)) +
  theme(axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        axis.line.y = element_blank(), 
        axis.line.x = element_blank(),
        axis.ticks.y = element_blank(),
        axis.ticks.x = element_blank(),
        plot.title = element_text(family = "Nosifer",
                                  size = 39, 
                                  color = "red4",
                                  hjust = 1),
        plot.subtitle = element_text(family = "Special Elite",
                                     size = 17,
                                     hjust = .95),
        axis.text = element_text(family = "Special Elite", 
                                 size = 12),
        axis.title = element_text(family = "Special Elite"),
        plot.margin = margin(1,.5,.5,.5, "cm"),
        plot.background = element_rect(fill = "#dbb69c"),
        panel.background = element_rect(fill = "#dbb69c")
        ) 


ggsave(filename =  "~/../Downloads/gun_ex.png", weap_graph, width = 12, height = 6)

weap_graph

# I am reporting on victim counts for the six race categories in the crime data
race_pie <- la_crimes %>%
  count(race) %>%
  ggplot() +
  ggforce::geom_arc_bar(
    aes(x0 = 0, y0 = 0, r0 = 0.8, r = 1, amount = n, fill = race),
   stat = "pie") +
  theme_void()+
  scale_fill_manual(values= c("#582851FF", "#40606DFF", "#69A257FF", "#E3D19CFF", "#C4024DFF", "orange")) + 
  labs(fill = "Top Victims by race") +
  theme(
    legend.text = element_text(size=25, family = "Special Elite"),
    legend.title = element_text(face="bold", size=25, family = "Special Elite", hjust = .3),
    legend.key.size = unit(0.35, 'cm'),
    legend.position = c(0.75,0.35),
    legend.justification = c(1, 0),
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    panel.grid.major.y = element_blank(),
    axis.line = element_blank(),
    plot.background = element_blank()
    )
ggsave(filename =  "~/../Downloads/race_pie.png", race_pie, width = 8, height = 6)

race_pie

#I want to create a donut chart of the top 5 crime locations
area_pie <- la_crimes %>%
  #sort = TRUE orders areas by count, so slice(1:5) keeps the 5 areas with the most reports
  count(area_name, sort = TRUE) %>%
  slice(1:5) %>% 
  ggplot() +
  ggforce::geom_arc_bar(
    aes(x0 = 0, y0 = 0, r0 = 0.8, r = 1, amount = n, fill = area_name),
   stat = "pie") +
  theme_void()+
  scale_fill_manual(values = c("#582851FF", "#40606DFF", "#69A257FF", "#E3D19CFF", "#C4024DFF")) +  
  labs(fill = "Top 5 Crime locations") +
  theme(
    legend.text = element_text(size=25, family = "Special Elite"),
    legend.title = element_text(face="bold", size=25, family = "Special Elite"),
    legend.key.size = unit(0.35, 'cm'),
    legend.position = c(0.75,0.35),
    legend.justification = c(1, 0),
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    panel.grid.major.y = element_blank(),
    axis.line = element_blank()
    )
ggsave(filename =  "~/../Downloads/area_pie.png", area_pie, width = 8, height = 6)

area_pie

Final plot draft

This is my draft for my infographic. I would still like to polish my weapon bar chart visual. I also still need to add some more decoration and touches that will make it even better. The draft shows an investigation pinboard: the pinboard is brown, with three main sections to represent my data, and a big title in the top middle says 'LA Crimes'. It has three visualizations that explore my dataset.
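One possible way to rough out the pinboard layout directly in R is with patchwork, which is already loaded above. The sketch below is only an illustration under stated assumptions: the three panels are placeholders built from made-up data, not the actual weapon, race, and area charts, and the final infographic may well be assembled differently.

```r
library(ggplot2)
library(patchwork)

# Toy data, made up for this sketch
toy <- data.frame(label = c("a", "b", "c"), count = c(3, 7, 5))

# Three placeholder panels standing in for the real charts
panel_1 <- ggplot(toy, aes(x = label, y = count)) + geom_col(fill = "black")
panel_2 <- ggplot(toy, aes(x = label, y = count)) + geom_col(fill = "midnightblue")
panel_3 <- ggplot(toy, aes(x = label, y = count)) + geom_col(fill = "red4")

# One wide panel on top, two side by side underneath, on a brown "pinboard"
pinboard <- panel_1 / (panel_2 | panel_3) +
  plot_annotation(
    title = "LA Crimes",
    theme = theme(
      plot.background = element_rect(fill = "#dbb69c"),
      plot.title = element_text(hjust = 0.5, face = "bold", size = 24)
    )
  )
```

Printing `pinboard` draws the combined layout, and `ggsave()` can write it to a file in the same way as the individual plots.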